Exploring Data Science Salaries:
Comparative Analysis of USA and India
Teammates:
Harshini Sai Sangadi - 11637776
Brahmateja Reddy Mule - 11648398
Venkata Surendra Reddy Sana - 11733861
Ponnaganti Mahesh Chowdary - 11656438
Introduction
Data Science Salaries
Data science salaries stand at the forefront of discussions within the tech industry, reflecting the
evolving demand for data-driven expertise.
Understanding the nuances of salary trends is not only crucial for professionals navigating their
career paths but also for organizations striving to attract and retain top talent.
This project delves into the realm of data science salaries, focusing on comparative analysis
between the United States and India, two prominent players in the global data science landscape.
By examining and contrasting salary data from these regions, we aim to uncover underlying
patterns, disparities, and implications for both individuals and organizations in the data science
domain.
Data Abstraction
Dataset:
The dataset utilized for this analysis consists of structured data capturing various attributes
related to data science salaries, including job titles, salaries, company locations, and possibly
additional factors such as years of experience or educational qualifications.
Attributes:
The dataset contains attributes such as job title, salary, company location, and other relevant
variables that contribute to understanding the salary landscape within the data science domain.
Number of Records:
The dataset comprises a substantial number of records, each representing an instance of a data
science job position along with associated salary information.
Source:
The dataset was sourced from Kaggle, chosen for its data quality and relevance to
the analysis.
Task Abstraction
Task:
The primary objective of this project is to analyze and compare data science salaries between the United States and India to
identify trends, disparities, and insights.
Target:
The target of the analysis includes understanding the distribution of salaries, identifying top-paying job titles, exploring
geographical variations in salaries, and uncovering any significant patterns or trends.
Actions:
Data Collection: Obtain relevant datasets containing information on data science salaries from both the United States and
India.
Data Preprocessing: Cleanse and prepare the datasets by handling missing values, standardizing data formats, and ensuring
data consistency.
Exploratory Data Analysis (EDA): Conduct EDA to gain initial insights into the distributions, trends, and patterns present in the data.
Visualization Creation: Create visualizations such as histograms, box plots, and bar plots to effectively communicate the salary
distributions, top-paying job titles, and geographical salary variations.
Comparative Analysis: Compare and contrast data science salaries between the United States and India to identify differences,
similarities, and any noteworthy observations.
Interpretation: Interpret the findings from the analysis to derive insights and implications for individuals and organizations
within the data science domain.
Workflow:
Data Collection:
Obtain datasets containing data science salary information from reliable sources for both the United States and India.
Data Preprocessing:
Cleanse the datasets by handling missing values, removing duplicates, and ensuring data consistency.
Perform data transformations such as standardizing formats, converting categorical variables to numerical, and scaling if necessary.
Exploratory Data Analysis (EDA):
Explore the datasets to gain insights into the distributions, trends, and patterns of data science salaries.
Conduct statistical analysis and visualization techniques to identify key features and outliers.
Visualization Creation:
Create visualizations such as histograms, box plots, bar plots, and scatter plots to effectively communicate salary distributions, top-paying
job titles, geographical salary variations, and salary trends over years of experience.
Comparative Analysis:
Compare data science salaries between the United States and India by examining differences, similarities, and significant observations.
Identify any disparities or trends that may exist between the two regions.
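The preprocessing and comparison steps above can be sketched in pandas as follows. This is a minimal illustration, not the project's actual code: the column names (`job_title`, `salary_usd`, `company_location`), the country codes `US` and `IN`, and the sample rows are assumptions made for the example, and salaries are assumed to be expressed in a single currency already.

```python
import pandas as pd


def clean_salaries(df: pd.DataFrame) -> pd.DataFrame:
    """Standardize formats, then drop unusable and duplicate rows."""
    out = df.copy()
    # Coerce salaries to numbers; entries that cannot be parsed become NaN.
    out["salary_usd"] = pd.to_numeric(out["salary_usd"], errors="coerce")
    out = out.dropna(subset=["job_title", "salary_usd", "company_location"]).copy()
    # Standardize text so that equal values compare equal.
    out["job_title"] = out["job_title"].str.strip()
    out["company_location"] = out["company_location"].str.strip().str.upper()
    # Deduplicate after standardizing, so near-identical rows are caught.
    return out.drop_duplicates().reset_index(drop=True)


def compare_countries(df: pd.DataFrame, countries=("US", "IN")) -> pd.DataFrame:
    """Summarize salaries per country (hypothetical country codes)."""
    subset = df[df["company_location"].isin(countries)]
    return subset.groupby("company_location")["salary_usd"].agg(
        ["count", "median", "mean"]
    )


# Made-up rows, for illustration only.
raw = pd.DataFrame({
    "job_title": ["Data Scientist ", "Data Scientist", "ML Engineer",
                  "Data Analyst", "Data Analyst", None],
    "salary_usd": [120000, 30000, "150000", 20000, 20000, 50000],
    "company_location": ["us", "IN", "US", "in", "IN", "US"],
})

clean = clean_salaries(raw)
summary = compare_countries(clean)
print(summary)
```

The median is reported next to the mean because salary data is usually skewed, and a few very high salaries can pull the mean upward.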
Implementation Using Tools
Python Programming Language:
Python serves as the primary programming language for data manipulation, analysis, and visualization tasks.
Pandas Library:
Pandas is utilized for data manipulation and preprocessing tasks, such as loading datasets, cleaning data, and performing transformations.
Matplotlib Library:
Matplotlib is used to create static visualizations such as histograms, box plots, and scatter plots to represent data science salary distributions, trends,
and comparisons.
Seaborn Library:
Seaborn complements Matplotlib by providing a higher-level interface for creating more visually appealing and informative statistical visualizations,
including bar plots and box plots.
Power BI:
Power BI is utilized for creating interactive dashboards and reports to visualize and explore data science salary trends dynamically.
D3.js (before data cleaning):
D3.js is used to create dynamic, web-based visualizations of data science salary trends from the data before cleaning.
Results for Analysis
Visualizing Top Jobs with Highest
Salaries in the US Data Science
Market Using D3.js (pre-cleaning)
This visualization showcases the top ten jobs with the
highest salaries in the US data science market.
By analyzing this data, we gain insights into the most
lucrative roles within the field, shedding light on the
career paths that offer the highest earning potential in
the United States.
Story:
As a data science enthusiast exploring career
opportunities in the United States, I navigate
through job listings and discover the top-paying
roles in the industry, ranging from Data
Scientists to Machine Learning Engineers.
This visualization guides me in understanding the
competitive landscape and aligning my career
aspirations with prevailing market trends.
Top Companies in India
by Data Science Salaries
Shifting our focus to India, the second visualization
highlights the top ten companies offering the highest
data science salaries.
Through this analysis, we uncover the organizations
leading the charge in compensating data science
talent within the Indian market, providing valuable
insights for job seekers and industry professionals.
Story:
Job seekers identify the top companies known for
their generous compensation packages in the data
science domain.
With this knowledge, individuals can strategically
target their job search and aim for positions in these
prestigious organizations.
Distribution of Salaries by
Employment Type in the US
Data Science Market
This third visualization presents the distribution of
salaries by employment type in the US data science
market.
By categorizing salaries based on employment
arrangements, such as full-time, part-time, or
contract, this visualization offers a comprehensive
understanding of the compensation structures
prevailing in the industry.
Story:
Individuals explore various employment options
available in the US data science landscape.
Through this visualization, they gain insights into the
salary distribution across different employment types,
enabling them to make informed decisions regarding
their career trajectory and preferred work
arrangements.
Distribution of Salaries by Work
Models in the US Data Science
Market
This fourth visualization delves into the distribution of
salaries by work models prevalent in the US data
science market.
By analyzing salary trends across different work
models, such as remote work, freelance, or on-site
employment, this visualization provides valuable
insights into the evolving nature of work
arrangements within the data science industry.
Story:
As individuals prepare to embark on their data science
career journey, they explore various work models
available in the market.
Through this visualization, they gain a deeper
understanding of how different work arrangements
impact salary structures, empowering them to choose
the work model that best aligns with their preferences
and lifestyle.
Visualizing the Cleaned
Dataset Using
Python
Visual Explanation:
The histogram shows how salaries are distributed among the
highest-paying data science jobs in India.
Each bar represents a salary range, and the taller the bar, the
more salaries fall within that range.
The smooth curve overlaid on the bars gives a general idea of
how salaries are spread out across the ranges.
Story:
Imagine a group of data enthusiasts eager to explore job
opportunities in India's tech sector. They stumble upon a
histogram showcasing the top salaries offered in data science
roles.
As they glance at the graph, they see different salary ranges,
from lower-paying positions to those offering substantial
compensation packages.
By looking at the heights of the bars, they notice where salary
concentrations are highest, giving them an idea of the most
common salary ranges in the industry.
This insight helps them understand the earning potential in the
field and guides their career decisions.
Exploring Top Data Science
Salaries and Job Titles in India
Visual Explanation:
The box plot displays the distribution of salaries among the top
10 job titles in India's data science sector.
Each box represents the salary range for a specific job title,
with the median indicated by the line inside the box.
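The whiskers extend to the lowest and highest salaries that are
not flagged as outliers (by default, points within 1.5 times the
interquartile range of the box), while any outliers are plotted
individually.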
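A box plot of this kind can be sketched with Seaborn's `boxplot`. How the "top 10" job titles were selected is not stated here, so the example ranks titles by median salary as an assumption; the column names `job_title` and `salary`, the helper names, and the sample rows are likewise hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def top_titles_by_median(df: pd.DataFrame, top_n: int = 10) -> list:
    """Return the job titles with the highest median salary, best first."""
    medians = df.groupby("job_title")["salary"].median()
    return list(medians.nlargest(top_n).index)


def plot_salary_by_title(df: pd.DataFrame, top_n: int = 10):
    """Draw one horizontal box per job title, ordered by median salary."""
    titles = top_titles_by_median(df, top_n)
    subset = df[df["job_title"].isin(titles)]
    fig, ax = plt.subplots(figsize=(10, 6))
    sns.boxplot(data=subset, x="salary", y="job_title", order=titles, ax=ax)
    ax.set_xlabel("Salary")
    ax.set_ylabel("Job title")
    return ax


# Made-up rows, for illustration only.
jobs = pd.DataFrame({
    "job_title": ["Data Scientist"] * 3 + ["ML Engineer"] * 3 + ["Data Analyst"] * 3,
    "salary": [10, 12, 14, 20, 22, 30, 5, 6, 7],
})
top = top_titles_by_median(jobs, top_n=2)
ax = plot_salary_by_title(jobs, top_n=2)
```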
Story:
Inquisitive minds intrigued by the dynamics of India's data
science job market stumble upon a box plot showcasing salary
distributions across the top 10 job titles.
As they observe the graph, they notice a series of boxes, each
representing a job title, and the spread of salaries associated
with it.
By examining the position of the median line within each box,
they gain insight into the typical salary range for each job title.
The length of the whiskers indicates the variability of salaries
within each role, with outliers providing additional context on
exceptionally high or low salary offers.
Exploring Top Data Science
Salaries Across Locations in India
Visual Explanation:
The bar plot illustrates the top 10 locations in India by data
science salaries. Each bar represents a location, with the length
indicating the average salary offered in that particular area.
The colors of the bars provide visual differentiation, while the
absence of error bars indicates no confidence intervals are
plotted.
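One way to build such a chart is to compute the average salary per location with pandas and draw the bars with Matplotlib, which adds no error bars unless asked to. The sketch below is an assumption-laden illustration: the column names `location` and `salary`, the helper names, and the sample rows are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd


def top_locations_by_mean_salary(df: pd.DataFrame, top_n: int = 10) -> pd.Series:
    """Average salary per location, keeping the top_n highest averages."""
    return df.groupby("location")["salary"].mean().nlargest(top_n)


def plot_top_locations(means: pd.Series):
    """Draw one horizontal bar per location; bar length is the mean salary."""
    fig, ax = plt.subplots(figsize=(8, 5))
    # Cycle through Matplotlib's default colors to tell the bars apart.
    colors = [f"C{i % 10}" for i in range(len(means))]
    ax.barh(list(means.index), list(means.values), color=colors)
    ax.invert_yaxis()  # put the highest-paying location at the top
    ax.set_xlabel("Average salary")
    ax.set_ylabel("Location")
    return ax


# Made-up rows, for illustration only.
offers = pd.DataFrame({
    "location": ["Bangalore", "Bangalore", "Hyderabad", "Hyderabad", "Pune"],
    "salary": [30, 40, 20, 30, 10],
})
means = top_locations_by_mean_salary(offers, top_n=2)
ax = plot_top_locations(means)
```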
Story:
A group of individuals intrigued by the geographical variations
in data science salaries in India comes across a bar plot
showcasing salary distributions across the top 10 locations.
As they examine the graph, they notice bars of varying lengths,
each representing a different location and the average salary
associated with it.
By comparing the lengths of the bars, they discern which
locations offer higher average salaries within the data science
domain.
This insight enables them to consider factors such as cost of
living and job market demand when evaluating potential career
opportunities across different regions.
Exploring Top Data
Science Salaries and Years of
Experience in India
Visual Explanation:
The scatter plot depicts the relationship between salaries and
years of experience in India's data science sector.
Each point represents a reported salary, with its position on the
x-axis indicating the years of experience associated with it.
The color of the points adds visual distinction, with an orange
hue chosen for clarity.
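A scatter plot like this one, together with a single number summarizing the relationship, can be sketched as follows. The column names `years_experience` and `salary`, the helper name, and the sample rows are assumptions for illustration; the Pearson correlation is added here as one possible summary and is not taken from the project's results.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd


def plot_salary_vs_experience(df: pd.DataFrame):
    """Scatter salaries (y-axis) against years of experience (x-axis)."""
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.scatter(df["years_experience"], df["salary"], color="orange", alpha=0.7)
    ax.set_xlabel("Years of experience")
    ax.set_ylabel("Salary")
    return ax


# Made-up rows, for illustration only.
people = pd.DataFrame({
    "years_experience": [1, 2, 3, 5, 8],
    "salary": [5, 7, 8, 12, 20],
})
ax = plot_salary_vs_experience(people)

# Pearson correlation: values near +1 mean salary rises with experience.
r = people["years_experience"].corr(people["salary"])
print(f"Pearson r = {r:.2f}")
```

A correlation coefficient only captures a linear trend, so it is best read alongside the plot rather than in place of it.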
Story:
Enthusiasts delving into the intricacies of data science careers
in India stumble upon a scatter plot showcasing the correlation
between salaries and years of experience.
As they delve into the graph, they observe a myriad of points
scattered across the plot, each representing a reported salary
linked to a specific level of experience.
By examining the distribution of points, they discern patterns
that reveal how salaries evolve with increasing years of
experience.
This insight aids them in understanding the typical salary
trajectories within the data science industry and serves as a
valuable reference for career planning and salary negotiation.
Exploring Top Data
Science Salaries in the USA
Visual Explanation:
The histogram illustrates the distribution of the top 10 salaries
within the data science sector in the USA.
Each bar represents a salary range, with the height indicating
the frequency of salaries within that range.
The presence of a kernel density estimate (KDE) overlay
provides a smoothed representation of the data distribution.
Story:
Data enthusiasts intrigued by the landscape of data science
careers in the USA come across a histogram showcasing the
distribution of the top 10 salaries.
As they delve into the graph, they observe a series of bars, each
representing a salary range, and the corresponding frequency
of salaries falling within those ranges.
By examining the heights of the bars, they gain insights into the
prevalence of specific salary bands within the data science
industry in the USA.
The KDE overlay offers additional context, providing a
smoothed representation of the data distribution and
highlighting any underlying patterns.
Analyzing Top Data Science
Salaries Across Job Titles in the
USA
Visual Explanation:
The box plot showcases the distribution of the top 10 salaries
among the top 10 job titles in the USA's data science sector.
Each box represents the salary range for a specific job title,
with the median marked by a line inside the box. The color
palette chosen enhances visual contrast between different job
titles.
Story:
Explorers delving into the nuances of data science careers in
the USA stumble upon a box plot depicting salary distributions
across the top 10 job titles.
As they delve into the visualization, they encounter a series of
boxes, each representing a job title, and the spread of salaries
associated with it.
By observing the position of the median line within each box,
they gain insight into the typical salary range for each job title.
The length of the whiskers indicates the variability of salaries
within each role, with outliers providing additional context on
exceptionally high or low salary offers.
Exploring Top Data Science
Salaries Across Company
Locations in the USA
Visual Explanation:
The bar plot illustrates the top 10 locations in the USA by data
science salaries.
Each bar represents a location, with the length indicating the
average salary offered in that particular area.
The coolwarm palette enhances visual differentiation between
different locations.
Story:
Curious minds intrigued by the geographical distribution of data
science salaries in the USA stumble upon a bar plot showcasing
salary distributions across the top 10 company locations.
As they delve into the visualization, they encounter bars of
varying lengths, each representing a different location and the
average salary associated with it.
By comparing the lengths of the bars, they discern which
locations offer higher average salaries within the data science
domain.
This insight enables them to consider factors such as cost of
living and job market demand when evaluating potential career
opportunities across different regions.
Microsoft Power BI
Work Management:
Work completed:
Every member of the team contributed to all parts of the project, but each part was
assigned to one member as its owner. The owner was responsible for tracking every
update to that part, which let us complete the tasks on time as planned.
Responsibility & Contributions:
Harshini Sangadi: Implementation (25%)
Brahmateja Reddy Mule: Analysis (25%)
Venkata Surendra Reddy Sana: Design (25%)
Ponnaganti Mahesh Chowdary: Documentation (25%)
References
1. McKinney, W. (2010). Data Structures for Statistical Computing in Python.
Proceedings of the 9th Python in Science Conference.
2. Hunter, J. D. (2007). Matplotlib: A 2D Graphics Environment. Computing in
Science & Engineering, 9(3), 90-95.
3. Waskom, M. (2022). seaborn: statistical data visualization. Journal of Open
Source Software, 7(77), 3021.
4. Kluyver, T., et al. (2016). Jupyter Notebooks - a publishing format for
reproducible computational workflows. In Loizides, F., & Schmidt, B. (Eds.),
Positioning and Power in Academic Publishing: Players, Agents and Agendas.
5. Microsoft. (n.d.). Power BI. Retrieved from https://powerbi.microsoft.com/.